A New Hybrid De Novo Sequencing Method For Protein Identification
Abstract
Tandem mass spectrometry is a powerful tool for studying proteins. However, an open problem in proteomics research is how to accurately identify proteins from experimental mass spectra. De novo sequencing-based protein identification is the only feasible approach for finding new proteins and studying protein post-translational modifications. In this paper, we describe our novel hybrid de novo sequencing-based protein identification method. It differs from existing methods, which rely on finding one maximum path in a spectrum graph. Instead, to identify peptides, our method applies a novel hybrid of a Bayesian network and dynamic programming to explore the sub-optimal solution space. Our method can therefore better accommodate the various interferences and artefacts present in mass spectra. Evaluated on a large number of spectra, our method outperforms the most popular de novo sequencing methods and can significantly improve the accuracy of de novo sequencing-based protein identification.
Keywords: protein identification, de novo sequencing, Bayesian network, dynamic programming, proteomics.
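To illustrate the contrast between a single maximum path and a set of sub-optimal paths in a spectrum graph, here is a minimal Python sketch. It is not the authors' algorithm: the Bayesian network is replaced by plain per-node scores, the residue table is a five-letter subset, and the function name `k_best_paths`, the tolerance, and the toy peak list are all assumptions made for illustration.

```python
import heapq

# Monoisotopic residue masses (Da) for a small illustrative subset of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "L": 113.08406}

def k_best_paths(nodes, total_mass, k=3, tol=0.02):
    """Return up to k (score, sequence) pairs for paths from mass 0 to total_mass.

    nodes: list of (prefix_mass, score) pairs, assumed already derived from peaks.
    Two nodes are connected when their mass difference matches one residue mass.
    """
    pts = sorted(set([(0.0, 0.0)] + list(nodes) + [(total_mass, 0.0)]))
    best = [[] for _ in pts]      # best[j]: top-k partial paths ending at node j
    best[0] = [(0.0, "")]
    for j in range(1, len(pts)):
        mass_j, score_j = pts[j]
        candidates = []
        for i in range(j):
            if not best[i]:
                continue          # node i is unreachable from mass 0
            gap = mass_j - pts[i][0]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    candidates.extend((s + score_j, seq + aa) for s, seq in best[i])
        best[j] = heapq.nlargest(k, candidates)
    return best[-1]

# Toy spectrum: strong evidence for G-A-S, weaker evidence for S-A-G.
G, A, S = (RESIDUE_MASS[x] for x in "GAS")
toy_nodes = [(G, 1.0), (G + A, 0.9), (S, 0.4), (S + A, 0.3)]
for score, seq in k_best_paths(toy_nodes, G + A + S, k=2):
    print(seq, round(score, 3))
```

With `k=1` the sketch reduces to the single maximum-path search; a larger `k` keeps lower-scoring candidates alive so that a later scoring stage can still choose among them.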
Related articles
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using a de Bruijn graph with d...
De novo identification of repeat families in large genomes
MOTIVATION: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a r...
A Comprehensive Comparison of the de novo Sequencing Accuracies of PEAKS, BioAnalyst and PLGS
1. Introduction: To identify proteins, a de novo sequencing algorithm computes the peptide sequences from MS/MS data without the need for a protein database. When proteins are heavily modified or from an organism whose genome is not sequenced, de novo sequencing is the only reliable approach to identify the proteins in a sample. De novo sequencing typically requires higher quality data than those...
Optimization algorithm for de novo analysis of tandem
Protein identification is usually achieved by tandem mass spectrometry (MS/MS). Because of the difficulty in measuring complete proteins using MS/MS, a protein is typically digested enzymatically into peptides and the MS/MS spectrum of each peptide is measured. Database searching methods are predominant in the task of peptide identification. Their aim is to find the best match between model...
Identification of phosphopeptides with unknown cleavage specificity by a de novo sequencing assisted database search strategy.
In theory, proteases with broad cleavage specificity could be applied to digest protein samples to improve the coverage of phosphoproteomic analysis. However, in practice this approach is seldom employed, because identifying phosphopeptides without enzyme specificity by a conventional database search strategy is extremely difficult due to the huge search space. In this study, we in...